Ensure operators are always closed#13721
Conversation
|
I do not understand why in the previous code |
sopel39
left a comment
There was a problem hiding this comment.
How did you find the issue? Is it a regression? Would it be possible to write a test?
|
Yeah, it's a tricky one. I tried my best to explain it with words in the description, but I can see how it can still be difficult to understand. Let me include some pictures to better display the race. Let's assume a thread that was executing a task that throws has executed the At the same very moment some other thread calls But since the first thread throws and won't execute the post script: As a result a |
|
@sopel39 The issue is discovered when debugging flakiness from #11275. I was trying to write a test, but it become very cumbersome very fast, as it requires rather tricky interactions. To write a test I would need to stop one thread after executing |
losipiuk
left a comment
There was a problem hiding this comment.
Thank @arhimondr for images. This is super clear now. And indeed the fix looks valid.
f3a62ff to
4401c1f
Compare
There was a problem hiding this comment.
Not necessary. The Optional.of() construction will fail if it's null.
There was a problem hiding this comment.
I added this more as a documentation so the intention is clearly stated. Otherwise when some task starts returning null somebody may interpret the Optional.of as a mistake and switch it to Optional.ofNullable instead of fixing the actual problem.
Due to a race condition in Driver#tryWithLock there was a chance an operator might have end up not being properly closed upon completion. - Driver#process executes an operator under the Driver#exclusiveLock - An operator throws an exception and triggers a task failure - A TaskStateMachine listener closes all drivers calling Driver#close - Driver#close is not able to acquire the Driver#exclusiveLock and assumes the driver will be terminated by the lock owner - The lock owner throws an exception and never runs post execution code in Driver#tryWithLock that was expected to close the operators
4401c1f to
20037c9
Compare




Description
Due to a race condition in Driver#tryWithLock there was a chance an
operator might have end up not being properly closed upon completion.
assumes the driver will be terminated by the lock owner
execution code in Driver#tryWithLock that was expected to close
the operators
Fix
Core engine
N/ARelated issues, pull requests, and links
#11275
Documentation
(X) No documentation is needed.
( ) Sufficient documentation is included in this PR.
( ) Documentation PR is available with #prnumber.
( ) Documentation issue #issuenumber is filed, and can be handled later.
Release notes
(X) No release notes entries required.
( ) Release notes entries required with the following suggested text: